Welcome to this lecture on the PageRank algorithm.
This is going to be the ultimate lecture of this course.
However, I think that the topic is a really nice topic to conclude the lecture series.
The PageRank algorithm is something which we encounter in our everyday life when we
use Google, for instance, and it's a special instance of a ranking algorithm.
And as you can already see from the buzzwords, we will again encounter graphs, and in particular
the graph-replacement operator that we've already seen in the last two lectures.
Okay, so this is basically the basic principle of the PageRank algorithm in a nutshell.
And this is the illustration that you can find on Wikipedia, but apparently quite appropriate,
this is why I show it here as well.
So what PageRank tries to do is the following.
You have a collection of websites, which are depicted by these smiling faces here.
And these websites, they possess links between each other.
And every time that a website links to another one, you draw an error, or in this case it's
an error with a finger pointing at the other website.
And what PageRank would like to find out is which of these websites are the most important
ones.
Because typically when you do a Google search, you will find these websites first, and then
the less important ones will be on the bottom of the search results.
Okay, but then of course the question arises, what is a good criterion for a website to
be important, let's say.
And looking at this picture, one could say that a website is important when many links
point towards it.
For instance, this yellow website here with the big smiley, this is deemed to be quite
important because there are many websites linking to it.
In fact, we have one, two, five, six, seven links to this website.
Okay, but of course this cannot be the only criterion.
For instance, if you look at this red website up here, this is also quite important, although
it just receives one link, namely from the yellow one.
However this link comes from a very important website, and so this website should also be
a bit more important.
And on the other hand, in contrary, these websites down here, they don't receive a single
link, they just link towards other websites, so they are quite unimportant.
Okay, but you can already see from this illustration that it's not immediately clear how to design
an algorithm which does the ranking, and just counting the links is maybe not the smartest
idea.
Okay, so before we start delving into the mathematics and the derivation of the PageRank
algorithm, let me give you a brief overview over the history of PageRank.
In fact, similar algorithms were first suggested already in the 17th by Pinsky and Narine in
their paper from 1976, and the concept there was very similar, however they applied the
ranking algorithm to citation rankings of scientific papers.
And here in the brackets I give you the citations that this paper received up to now according
to Google Scholar.
With this I would like to make clear which of these words that I will speak about have
had the greatest influence.
Okay, so this paper was cited 800 times, so it's quite important.
Then there is another version of a ranking algorithm which is very similar to PageRank,
and which was invented by Love and Slowman in 1995, so approximately 20 years later.
However, this one, this work only received 27 citations up to now, so not many people
seem to have looked into that.
Presenters
Leon Bungert
Zugänglich über
Offener Zugang
Dauer
00:43:10 Min
Aufnahmedatum
2021-06-17
Hochgeladen am
2021-06-17 16:27:02
Sprache
en-US